A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation Using First-Order Logic

Authors

  • David Andrzejewski
  • Xiaojin Zhu
  • Mark Craven
  • Benjamin Recht
Abstract

Topic models have been used successfully for a variety of problems, often in the form of application-specific extensions of the basic Latent Dirichlet Allocation (LDA) model. Because deriving these new models in order to encode domain knowledge can be difficult and time-consuming, we propose the Fold·all model, which allows the user to specify general domain knowledge in First-Order Logic (FOL). However, combining topic modeling with FOL can result in inference problems beyond the capabilities of existing techniques. We have therefore developed a scalable inference technique using stochastic gradient descent, which may also be useful to the Markov Logic Network (MLN) research community. Experiments demonstrate the expressive power of Fold·all, as well as the scalability of our proposed inference method.


Similar resources

Robust RegBayes: Selectively Incorporating First-Order Logic Domain Knowledge into Bayesian Models

Much research in Bayesian modeling has been done to elicit a prior distribution that incorporates domain knowledge. We present a novel and more direct approach by imposing First-Order Logic (FOL) rules on the posterior distribution. Our approach unifies FOL and Bayesian modeling under the regularized Bayesian framework. In addition, our approach automatically estimates the uncertainty of FOL ru...


Dialog act modeling for virtual personal assistant applications using a small volume of labeled data and domain knowledge

Recently, virtual personal assistant (VPA) applications have been employed in mobile devices, providing a natural and convenient interface between humans and machines. As VPA services become popular, consumers demand a wider range of services than their current scope, so rapid development becomes more important. This paper introduces a dialog act modeling approach for VPA applications, which is an...


Legal Documents Clustering using Latent Dirichlet Allocation

At present, the availability of a large number of legal judgments in digital form creates opportunities and challenges for both the legal community and information technology researchers. This development calls for assistance in organizing, analyzing, retrieving and presenting this content in a helpful and distributed manner. We propose an approach to cluster legal judgments based on th...


Bayesian inference for statistical abduction using Markov chain Monte Carlo

Abduction is one of the basic logical inferences (deduction, induction and abduction) and derives the best explanations for our observations. Statistical abduction attempts to define a probability distribution over explanations and to evaluate them by their probabilities. The framework of statistical abduction is general since many well-known probabilistic models, e.g., BNs, HMMs and PCFGs, are ...


Knowledge, Insight and Management

Model building is one of the most important characteristics of human beings. According to one classification, there are three kinds of models. The first kind are models that are unique to each individual; everybody has his own models. The second kind are general models, from which theories arise. The third kind are models for model building and deals wit...




Publication date: 2011